Replication information for "Is the Time Allocated to Review Patent Applications Inducing Examiners to Grant Invalid Patents?: Evidence from Micro-Level Application Data"

By: Michael D. Frakes and Melissa F. Wasserman

Overview:
We are posting the two Stata .do files required to (1) transform the source data files (in ASCII format) into the analytical data sets and (2) estimate specifications using the analytical data sets and produce the results presented in the paper.  We are also posting a third .do file that uses saved estimates from analysis.do to produce the figures presented in the paper.

In this README file, we list each data file (source files and analytical data sets), list the relevant variables in the source files, and describe the relevant .do files.

In addition to the various source files, we are also posting the two key analytical data sets, (1) analytical_application.txt and (2) analytical_patented, which can be used immediately with the regression .do file, analysis.do.

Note: our analysis was performed using Stata 13.

Names of Source Data Files:
(1) transaction_history.txt

(a) NOTES: each observation in this file is a given recorded event within a given patent application (across all applications in the sample), e.g., a record indicating that the application was docketed to an examiner, a record indicating that a non-final rejection was made, and so forth.

(2) applications.txt

(a) NOTES: each observation in this file is a given application from the USPTO's PAIR database (originally obtained via https://www.google.com/googlebooks/uspto-patents-pair.html).

(3) rejections.txt

(a) NOTES: each observation in this file is a given rejection type from a given application, across all applications in our sample in which at least one rejection was issued.  If an application received three different types of rejections during its prosecution, this file registers three observations for that application.  If an application received more than one rejection of the same type during its prosecution, this file registers only one observation for that type.





(4) 201502_TPF_USPTO.txt and 201502_TPF_Core.txt

(a) NOTES: these are the source files for the triadic patent family data.  These files were obtained from the OECD.  We have provided the documentation given to us by the OECD in connection with these files.

(5) litigation.txt

(a) NOTES: each observation in this file is a given issued patent out of the set of patents that appeared in the Lex Machina patent litigation database, based on post-2000 filing dates.  Patents that were never litigated do not appear in this source dataset.  

(6) examiners_first_year.txt

(a) NOTES: each observation is a given application in the USPTO PAIR sample, with information on the hiring year of the examiner assigned to that application (left-censored at 1992).  We are also enclosing two separate source files that can be used to recreate this source file: examiners_gau.txt (annual PTO examiner rosters dating back to 1993) and examiners.txt (an alternative annual PTO examiner roster file dating back to 1992).  Having alternative rosters can be useful in dealing with the challenges of matching roster names to examiner names in PAIR records.

(7) excite.txt

(a) NOTES: each observation is a given backward citation (indicating the patent number of the cited patent) within a given citing patent.


(8) examiner_gs.txt  
	
(a) NOTES: this file contains year-specific rosters of examiners at the PTO, with the names and General Schedule (GS) grade levels of each examiner working at the PTO in the indicated year.

(9) position_factor.txt

(a) NOTES: this file contains year-specific rosters of GS-13 and GS-15 examiners, indicating the "position factor" for each GS-13 and GS-15 examiner in the indicated year.  Position factors bear on the examination time allocated to each examiner to review applications.  Unlike other GS levels, GS-13 and GS-15 each have two different types of time allocations.

(10) entity_size.txt

(a) NOTES: each observation is a given application in the PTO PAIR database, with information on the applicant's entity-size status (large or small), as that term is used by the PTO for fee-setting purposes (a small entity generally has no more than 500 employees).

(11) class_match.txt

(a) NOTES: this file provides a crosswalk between PTO classes and NBER technology categories and sub-categories.

(12) filings.txt

(a) NOTES: this file is organized at the PTO Class / year level, providing information on the number of filings in the indicated class and indicated year.

(13) productivity.txt

(a) NOTES: this file is organized at the PTO class level and indicates, for each class, the number of examination hours allocated to GS-12 examiners.

(14) nonpatentpriorart.txt

(a) NOTES: this file is organized at the issued-patent level and provides counts of the non-patent prior art references cited by applicants and by examiners.







Variables

Variables in applications.txt source file (description in parentheses):

(1) file_name (PTO application number)
(2) filing_or_371_c_month (application filing month)
(3) filing_or_371_c_day (application filing day)
(4) filing_or_371_c_year (application filing year)
(5) application_type (e.g., plant, utility; note that the applications file was compiled to cover utility applications comprehensively and should not be used to analyze non-utility applications)
(6) examiner_name (name of the main examiner associated with the application, i.e., the assistant examiner if both an assistant and a primary examiner worked on the application, or the primary examiner if only a primary examiner worked on it)
(7) group_art_unit (Art Unit to which application was assigned)
(8) confirmation_number (not used in paper)
(9) attorney_docket_number (not used in paper)
(10) class_subclass (PTO classification and sub-classification number)
(11) first_named_inventor (not used in paper)
(12) customer_number (not used in paper)
(13) status (last known status update on the application, e.g., application abandoned, patented case, payment of renewal fee; this variable is helpful in ascertaining whether or not the application was allowed)
(14) status_month (month associated with status update indicated in status variable)
(15) status_day (day associated with status update indicated in status variable)
(16) status_year (year associated with status update indicated in status variable)
(17) location (not used in paper) 
(18) location_month (not used in paper) 
(19) location_day (not used in paper) 
(20) location_year (not used in paper) 
(21) earliest_publication_no (not used in paper)
(22) earliest_publication_month (not used in paper)
(23) earliest_publication_day (not used in paper)
(24) earliest_publication_year (not used in paper)
(25) patent_number (PTO-assigned patent number, if the application results in an issued patent)
(26) issue_date_of_patent_month (month of issuance of patent, if patent results from application)
(27) issue_date_of_patent_day (day of issuance of patent, if patent results from application)
(28) issue_date_of_patent_year (year of issuance of patent, if patent results from application)
(29) title_of_invention (application title; not used in paper)
(30) entity_size (large- or small-entity status of applicant, as used by PTO in assessing fees)
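Because dates in applications.txt are split into separate month, day, and year fields, they must be recombined before computing date differences. The Python sketch below is a hypothetical illustration, not the authors' Stata code in setup.do: it assumes each record is a dictionary keyed by the variable names listed above, treats a non-empty patent_number as marking an issued patent (an assumption; the status field may matter for edge cases), and uses an invented toy record.

```python
from datetime import date

def build_date(month, day, year):
    """Combine split month/day/year fields into a date; None if any part is blank or invalid."""
    try:
        return date(int(year), int(month), int(day))
    except (TypeError, ValueError):
        return None

def summarize(record):
    """Return grant flag, filing date, and filing-to-issue lag (days) for one record."""
    filed = build_date(record["filing_or_371_c_month"],
                       record["filing_or_371_c_day"],
                       record["filing_or_371_c_year"])
    issued = build_date(record["issue_date_of_patent_month"],
                        record["issue_date_of_patent_day"],
                        record["issue_date_of_patent_year"])
    # Assumption: a non-empty patent_number marks an issued patent.
    granted = bool(str(record.get("patent_number") or "").strip())
    lag = (issued - filed).days if (granted and filed and issued) else None
    return {"granted": granted, "filed": filed, "issue_lag_days": lag}

# Invented toy record, not taken from applications.txt.
toy = {
    "filing_or_371_c_month": "3", "filing_or_371_c_day": "15", "filing_or_371_c_year": "2005",
    "issue_date_of_patent_month": "3", "issue_date_of_patent_day": "15",
    "issue_date_of_patent_year": "2008",
    "patent_number": "7123456",
}
result = summarize(toy)
print(result)
```

Returning None for blank or impossible dates (e.g., a missing issue date on an abandoned application) keeps such records from silently producing bogus lags.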

Variables in transaction_history.txt source file (description in parentheses):

(1) file_name (PTO Application Number)
(2) month (month associated with transaction of record)
(3) day (day associated with transaction of record)
(4) year (year associated with transaction of record)
(5) transaction_code (code signifying the event of record connected with a given observation; see the enclosed file for a list of transaction codes)

Variables in rejections.txt source file (description in parentheses):

(1) file_name (PTO Application Number)
(2) rejection (type of rejection associated with the observation, by statutory provision, e.g., 35USC101)
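The rejections file keeps one row per rejection type per application, even when the same type of rejection was issued several times. The Python sketch below illustrates that collapsing rule on an invented list of per-event (file_name, rejection) pairs; the application numbers and the per-event input are hypothetical, and this is not the authors' construction code.

```python
from collections import Counter

def collapse_rejections(events):
    """Reduce per-event (file_name, rejection) pairs to unique pairs, sorted."""
    return sorted(set(events))

# Invented toy events: application 10000001 receives two separate 103 rejections.
toy_events = [
    ("10000001", "35USC102"),
    ("10000001", "35USC103"),
    ("10000001", "35USC103"),
    ("10000001", "35USC112"),
    ("10000002", "35USC101"),
]
rows = collapse_rejections(toy_events)

# Number of distinct rejection types per application after collapsing.
types_per_application = Counter(file_name for file_name, _ in rows)
print(types_per_application)
```

Because repeats are dropped, counting rows per application gives the number of distinct rejection types, not the number of rejections issued.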

Variables in 201502_TPF_USPTO.txt and 201502_TPF_Core.txt: see the enclosed documentation provided by the OECD regarding this triadic patent family dataset

Variables in litigation.txt (description in parentheses)

(1) patent_number (PTO assigned number for issued patent)
(2) title (title of issued patent)
(3) asserted (the number of times the issued patent was asserted in litigation between January 1, 2000, and the end of 2013)
(4) opencases (not used in paper)
(5) infringement (not used in paper)
(6) invalidity (indicator for whether the patent was invalidated in court)
(7) unenforceability (not used in paper) 
(8) damages (not used in paper)


Variables in examiners_first_year.txt (description in parentheses)

(1) file_name (PTO Application Number)
(2) first_year2 (First year employed at PTO; left censored at 1992)

Variables in examiners_gau.txt (description in parentheses) (note, the first of two alternative sets of examiner rosters)

(1) year (examiner roster year, indicating that the stated examiner worked for the PTO in the indicated year; rosters date back to 1993)
(2) gau (the Group Art Unit to which the indicated examiner belonged in the indicated year)
(3) examiner (the name of the examiner)

Variables in examiners.txt (description in parentheses) (note, the second of two alternative sets of examiner rosters)

(1) year (examiner roster year, indicating that the indicated examiner worked for the PTO in the indicated year; rosters date back to 1992)
(2) name (name of the examiner)
(3) examiner_name (redundant)
(4) code4 (broad assignment group code within PTO; not used in paper)
(5) code6 (narrower assignment group code within PTO, largely indicating group art unit to which the indicated examiner is assigned in the indicated year; not used in paper)
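Since examiners_first_year.txt can be recreated from the two roster files, one plausible approach is to take the earliest roster year in which each examiner appears. The Python sketch below is a hypothetical illustration of that idea, not the authors' procedure: the name normalization (uppercase, punctuation stripped, whitespace collapsed) is an assumption, and the toy roster names are invented. Real PAIR-to-roster matching is harder than this.

```python
import re

def normalize(name):
    """Crude matching key: uppercase, replace non-letters with spaces, collapse whitespace."""
    name = re.sub(r"[^A-Z ]", " ", name.upper())
    return " ".join(name.split())

def first_roster_year(*rosters):
    """rosters: iterables of (year, examiner_name). Returns {name_key: earliest year seen}.

    The result is left-censored at the earliest roster year available
    (1992 if examiners.txt is included), so a value of 1992 means "1992 or earlier".
    """
    first = {}
    for roster in rosters:
        for year, name in roster:
            key = normalize(name)
            if key not in first or year < first[key]:
                first[key] = year
    return first

# Invented toy rosters mimicking examiners_gau.txt (from 1993) and examiners.txt (from 1992).
gau_roster = [(1993, "Smith, Jane A."), (1994, "SMITH, JANE A"), (1998, "Doe, John")]
alt_roster = [(1992, "smith jane a"), (1999, "Doe John")]

first_year = first_roster_year(gau_roster, alt_roster)
print(first_year)
```

Pooling both rosters both extends coverage back to 1992 and gives a second chance to match names that are spelled differently across sources.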
 
Variables in excite.txt (description in parentheses)

(1) patent (PTO Patent Number for citing patent)
(2) cited (PTO Patent Number for cited patent)
(3) excite (indicator variable for whether the cited patent was cited by the examiner, as opposed to the applicant)
(4) issyear (year in which the cited patent was issued)
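As one illustration of how the excite indicator can be aggregated, the Python sketch below computes, for each citing patent, the share of its backward citations that were added by the examiner. It assumes excite is coded 1 for examiner citations and 0 for applicant citations; the patent identifiers are invented placeholders, and this is not necessarily how the paper's measures are built.

```python
from collections import defaultdict

def examiner_share(rows):
    """rows: iterable of (patent, cited, excite). Returns {patent: share examiner-cited}."""
    totals = defaultdict(int)
    by_examiner = defaultdict(int)
    for patent, _cited, excite in rows:
        totals[patent] += 1
        by_examiner[patent] += int(excite)  # assumes 1 = examiner, 0 = applicant
    return {p: by_examiner[p] / totals[p] for p in totals}

# Invented toy citation rows (patent, cited, excite).
toy_rows = [
    ("P1", "C1", 1), ("P1", "C2", 0), ("P1", "C3", 0), ("P1", "C4", 1),
    ("P2", "C5", 0),
]
shares = examiner_share(toy_rows)
print(shares)
```

Patents with no rows in excite.txt (no backward citations) simply do not appear in the result, so they would need separate handling when merged onto a patent-level data set.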

Variables in examiner_gs.txt (description in parentheses)

(1) examiner (name of the examiner)
(2) year (roster year associated with the record)
(3) pay_code (not used in paper)
(4) grade (General Schedule level of the indicated examiner in the indicated year)

Variables in position_factor.txt (description in parentheses)

(1) year (roster year associated with the record)
(2) exam (name of the examiner)
(3) role (not used in paper)
(4) grade (GS-13 or GS-15)
(5) factor (position factor for the indicated GS-13 or GS-15 examiner)

Variables in entity_size.txt (description in parentheses)

(1) file_name (PTO application number)
(2) entity_size (LARGE vs. SMALL entity status)

Variables in class_match.txt (description in parentheses)

(1) class (PTO Patent Class)
(2) cat (NBER technology category)
(3) subcat (NBER technology sub-category)
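To attach NBER categories to an application, the PTO class must first be extracted from class_subclass in applications.txt and then looked up in class_match.txt. The Python sketch below is a hypothetical illustration: it assumes class_subclass has the form "CLASS/SUBCLASS" (e.g., "435/069100") and that classes match the crosswalk after stripping leading zeros. Both format assumptions should be checked against the data, and the crosswalk values shown are placeholders, not taken from class_match.txt.

```python
def pto_class(class_subclass):
    """Extract the class portion before the slash, without leading zeros (assumed format)."""
    head = class_subclass.split("/", 1)[0].strip()
    return head.lstrip("0") or head

def nber_lookup(class_subclass, crosswalk):
    """crosswalk: {class: (cat, subcat)}. Returns (cat, subcat), or None if unmatched."""
    return crosswalk.get(pto_class(class_subclass))

# Placeholder crosswalk entries; look up actual codes in class_match.txt.
toy_crosswalk = {"435": (3, 33), "705": (2, 22)}

print(nber_lookup("435/069100", toy_crosswalk))
```

Returning None for unmatched classes makes it easy to count how many applications fail to map before merging.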

Variables in filings.txt (description in parentheses)

(1) year (year associated with the filing count) 
(2) class (PTO classification)
(3) serialized_filings (count of new filings and continuation filings)
(4) rce_r129_cpa_filings (count of RCE filings)
(5) total_filings (count of all filings)

Variables in productivity.txt (description in parentheses)

(1) class (PTO classification)
(2) count (the number of hours allocated to GS-12 examiners for examination in the indicated class)

Variables in nonpatentpriorart.txt (description in parentheses)

(1) patent_number (PTO patent number)
(2) nonpatent_by_applicant (number of nonpatent prior art citations provided by applicant for the indicated patent)
(3) nonpatent_by_examiner (number of nonpatent prior art citations provided by examiner for the indicated patent)
(4) nonpatent_unknown (number of nonpatent prior art citations whose origin is unknown; rarely greater than 0)



Description of .do files

(1) setup.do: using the source files described above, this .do file produces two analytical data sets for use with the regression file discussed below: (a) analytical_application.dta (for use with regressions on the application sample) and (b) analytical_patented.dta (for use with regressions on the patented sample).
(2) analysis.do: using analytical_application.dta and analytical_patented.dta, this .do file runs the various regressions reported in the paper and appendix.
(3) figures_final.do: using saved estimates from analysis.do, this .do file produces the various figures presented in the paper.


